Swarm learning (SL) is an emerging and promising decentralized machine learning paradigm that has achieved high performance in clinical applications. SL removes the need for the central structure used in federated learning by combining edge computing with a blockchain-based peer-to-peer network. While results are promising under the assumption of independent and identically distributed (IID) data across participants, SL suffers from performance degradation as the degree of non-IID data increases. To address this problem, we propose a generative augmentation framework for swarm learning called SL-GAN, which augments the non-IID data by generating synthetic data from participants. SL-GAN trains generators and discriminators locally and periodically aggregates them via a randomly elected coordinator in the SL network. Under standard assumptions, we theoretically prove the convergence of SL-GAN using stochastic approximations. Experimental results demonstrate that SL-GAN outperforms state-of-the-art methods on three real-world clinical datasets: Tuberculosis, Leukemia, and COVID-19.
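The periodic aggregation step described in the abstract above can be illustrated with a minimal sketch. The abstract does not state the election or aggregation rule, so the uniform random election, the unweighted element-wise averaging, and the name `aggregate_round` are all assumptions made for illustration, not the authors' method.

```python
import random


def aggregate_round(peer_params, rng):
    """One hypothetical aggregation round among peers.

    peer_params: list of parameter vectors (lists of floats), one per peer.
    rng: a random.Random instance used for the coordinator election.
    """
    n_peers = len(peer_params)
    dim = len(peer_params[0])
    # Assumption: the coordinator is elected uniformly at random.
    coordinator = rng.randrange(n_peers)
    # Assumption: the coordinator takes an unweighted element-wise mean.
    averaged = [sum(p[i] for p in peer_params) / n_peers for i in range(dim)]
    # Every peer replaces its local parameters with the averaged copy.
    synced = [list(averaged) for _ in range(n_peers)]
    return coordinator, synced


peers = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
coordinator, synced = aggregate_round(peers, random.Random(0))
```

In an actual GAN setting the same step would be applied separately to the generator and discriminator parameters; here a single flat vector per peer stands in for both.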
translated by Google Translate
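Continual relation extraction (CRE) requires a model to continually learn new relations from class-incremental data streams. In this paper, we propose a frustratingly easy but effective approach (FEA) for CRE with two learning stages: 1) Fast Adaption (FA) warms up the model using only the new data. 2) Balanced Tuning (BT) fine-tunes the model on the balanced memory data. Despite its simplicity, FEA achieves comparable or, in some cases, superior performance compared with state-of-the-art baselines. Through careful examination, we find that the data imbalance between new and old relations leads to a skewed decision boundary in the head classifiers on top of the pretrained encoders, which hurts overall performance. In FEA, the FA stage unleashes the potential of the memory data for the subsequent fine-tuning, while the BT stage helps establish a more balanced decision boundary. From a unified view, we find that two strong CRE baselines can be subsumed into the proposed training pipeline. The success of FEA also provides actionable insights and suggestions for future model design in CRE.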
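The key to human-object interaction (HOI) recognition is inferring the relationship between humans and objects. Recently, HOI detection in images has made significant progress. However, there is still room for improvement in video HOI detection. Existing one-stage methods use carefully designed end-to-end networks to detect a video segment and directly predict an interaction, which makes model learning and further optimization of the network more complex. This paper introduces the Spatial Parsing and Dynamic Temporal Pooling (SPDTP) network, which takes the entire video as input in the form of a spatio-temporal graph with human and object nodes. Unlike existing methods, our proposed network predicts the difference between interactive and non-interactive pairs through explicit spatial parsing and then performs interaction recognition. In addition, we propose a learnable and differentiable Dynamic Temporal Module (DTM) to emphasize the key frames of the video and suppress redundant frames. Experimental results show that SPDTP pays more attention to active human-object pairs and valid key frames. Overall, we achieve state-of-the-art performance on the CAD-120 dataset and the Something-Else dataset.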
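Aspect sentiment triplet extraction (ASTE) aims to recognize targets, their sentiment polarities, and the opinions that explain the sentiment in a sentence. ASTE can be naturally divided into three atomic subtasks: target detection, opinion detection, and sentiment classification. We argue that a proper subtask combination, compositional feature extraction for target-opinion pairs, and interaction between subtasks are the keys to success. Prior work, however, may fail in "one-to-many" or "many-to-one" situations, or derive non-existent sentiment triplets, because of defective subtask formulation, sub-optimal feature representation, or a lack of subtask interaction. In this paper, we divide ASTE into target-opinion joint detection and sentiment classification subtasks, which is in line with human cognition, and correspondingly use a sequence encoder and a table encoder to handle them. The table encoder extracts sentiment at the token-pair level, so that the compositional features between targets and opinions can be captured easily. To establish explicit interaction between subtasks, we use the table representation to guide the sequence encoding and inject the sequence features back into the table encoder. Experiments show that our model outperforms state-of-the-art methods on six popular ASTE datasets.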
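Aspect sentiment triplet extraction (ASTE) aims to extract triplets from a sentence, each consisting of a target entity, the associated sentiment polarity, and the opinion span that rationalizes the polarity. Existing methods fall short in building correlation between target-opinion pairs and neglect the mutual interference among different sentiment triplets. To address these issues, we use a two-stage framework to strengthen the correlation between targets and opinions: in the first stage, targets and opinions are extracted through sequence tagging; then we insert a group of artificial tags named Perceivable Pair, which indicate the spans of a specific target-opinion tuple, into the input sentence to obtain more closely correlated target-opinion pair representations. Meanwhile, we reduce the negative interference between triplets by restricting the attention field of tokens. Finally, the polarity is identified according to the representation of the Perceivable Pair. We conduct experiments on four datasets, and the results demonstrate the effectiveness of our model.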
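Deep learning techniques have led to remarkable breakthroughs in generic object detection and have spawned many scene-understanding tasks in recent years. The scene graph has been a focus of research because of its powerful semantic representation and its applications to scene understanding. Scene graph generation (SGG) refers to the task of automatically mapping an image into a semantically structured scene graph, which requires correctly labeling the detected objects and their relationships. Although this is a challenging task, the community has proposed many SGG approaches and achieved good results. In this paper, we provide a comprehensive survey of recent achievements in this field brought about by deep learning techniques. We review 138 representative works that cover different input modalities, and systematically summarize existing image-based SGG methods from the perspective of feature extraction and fusion. We attempt to connect and systematize the existing visual relationship detection methods, and to summarize and interpret the mechanisms and strategies of SGG in a comprehensive way. Finally, we conclude this survey with an in-depth discussion of current problems and future research directions. This survey will help readers develop a better understanding of the current state of research and its ideas.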
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point-cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple, yet its performance is impressive. CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even if the LiDAR input is missing. Code will be released at https://github.com/junjie18/CMT.
Knowledge graphs (KGs) have served as a key component of various natural language processing applications. Commonsense knowledge graphs (CKGs) are a special type of KG in which entities and relations are composed of free-form text. However, previous work on KG completion and CKG completion suffers from long-tail relations and newly added relations that do not have many known triples for training. In light of this, few-shot KG completion (FKGC), which requires the strengths of both graph representation learning and few-shot learning, has been proposed to tackle the problem of limited annotated data. In this paper, we comprehensively survey previous attempts at such tasks in the form of a series of methods and applications. Specifically, we first introduce FKGC challenges, commonly used KGs, and CKGs. Then we systematically categorize and summarize existing works in terms of the type of KG and the methods. Finally, we present applications of FKGC models to prediction tasks in different areas and share our thoughts on future research directions for FKGC.
Few-Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes given only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After these steps, performance on the novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarked on the COCO dataset under the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots; for example, we boost nAP by +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few-Shot Object Detection. Code and model will be available.
Graph Neural Networks (GNNs) have shown satisfying performance on various graph learning tasks. To achieve better fitting capability, most GNNs have a large number of parameters, which makes them computationally expensive. It is therefore difficult to deploy them on edge devices with scarce computational resources, e.g., mobile phones and wearable smart devices. Knowledge Distillation (KD) is a common solution for compressing GNNs, in which a lightweight model (i.e., the student model) is encouraged to mimic the behavior of a computationally expensive GNN (i.e., the teacher GNN model). Nevertheless, most existing GNN-based KD methods do not take fairness into consideration. As a consequence, the student model usually inherits and even exaggerates the bias of the teacher GNN. To handle this problem, we take initial steps towards fair knowledge distillation for GNNs. Specifically, we first formulate a novel problem of fair knowledge distillation for GNN-based teacher-student frameworks. Then we propose a principled framework named RELIANT to mitigate the bias exhibited by the student model. Notably, the design of RELIANT is decoupled from any specific teacher and student model structures, and thus can be easily adapted to various GNN-based KD frameworks. We perform extensive experiments on multiple real-world datasets, which corroborate that RELIANT achieves less biased GNN knowledge distillation while maintaining high prediction utility.
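The teacher-student mimicry that the abstract above builds on can be sketched with the commonly used temperature-softened distillation loss. This is a generic knowledge distillation objective for a single node's class logits, not the RELIANT framework itself; the function names, the temperature value, and the example logits are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Generic soft-label distillation; it carries no fairness term.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    return temperature ** 2 * kl


# Hypothetical logits for one node with three classes.
teacher = [4.0, 1.0, 0.5]
loss_same = distillation_loss(teacher, teacher)
loss_diff = distillation_loss([0.5, 1.0, 4.0], teacher)
```

The loss is zero when the student reproduces the teacher's logits and grows as the two softened distributions diverge, which is also why a student trained this way can inherit whatever bias the teacher's predictions carry.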